A Corpus-Driven Study of the Variation of Co-Occurrence Patterns in Written and Spoken Registers
Abstract
This paper studies the variation of co-occurrence patterns in written and spoken registers through the analysis of a large lexical database of corpus-extracted multiword expressions (MWEs) of European Portuguese. The MWEs were automatically extracted from a balanced 50-million-word written corpus and a 1-million-word spoken corpus, statistically interpreted using lexical association measures, and, for the written units, partially validated by hand. MWEs have long been, and remain, a challenge for linguistic analysis, lexicography and natural language processing, because of their wide pattern variation and because their analysis must draw on several linguistic levels, with parameters such as degree of syntactic cohesion, inflectional variation and semantic compositionality. In this paper, we aim to revise some typologies of MWEs using a corpus-driven approach, to analyse corpus findings and their relation to MWE categorization, and to establish possible register contrasts based on syntactic, functional and semantic paradigms: for example, contrasts between spoken and written texts, or contrasts in degree of formality that cut across both registers. By presenting register-specific co-occurrence patterns based on authentic data, this study aims to contribute to the more general categorization of MWEs in Portuguese.
Similar resources
Semantic processing survey of spoken and written words in adolescents with cerebral palsy: Evidence from PALPA word-picture matching test
Objective: The present study aimed to assess and compare the semantic processing of spoken and written words in adolescents with cerebral palsy and in healthy adolescents. Method: The study is quantitative in type and experimental in method. The examination group consisted of 30 adolescents with cerebral palsy, aged 10 to 15 years, selected by convenience sampling. All of ...
Variation in Language and Cohesion across Written and Spoken Registers
This paper investigates the variation in cohesion across written and spoken registers. The same method and corpora were used as in Biber’s (1988) study on linguistic variation across speech and writing; however, instead of focusing on 67 linguistic features that primarily operate at the word level, we compared 236 language and cohesion features at the text level. Variations in frequencies across ...
The Assessment of Pragmatic Knowledge in the Online General IELTS-Practice Resources: A Corpus Analysis of Writing Tasks
Motivated by the concept of Communicative Language Ability and the eminence of the IELTS exam, this study intended to scrutinize the representation of functional knowledge (FK) and socio-linguistic knowledge (SK) as sub-components of pragmatic knowledge in the writing performances of both tasks of the online General IELTS-practice resources across three band scores. This quantitative inter-scor...
The Effect of CMC in Business Emails in Lingua Franca: Discourse Features and Misunderstandings
The paper argues that everyday exchange of business emails produces a development in the work-group relationship, which, in turn, makes new communication styles possible and acceptable by the users' habit to computer-mediated forms, even in unbalanced professional exchanges. The focus is on the (spoken) discourse features of email messages in a self-compiled corpus of selected computer-mediated...
ACADEMIC WRITING REVISITED: A PHRASEOLOGICAL ANALYSIS OF APPLIED LINGUISTICS HIGH-STAKE GENRES FROM THE PERSPECTIVE OF LEXICAL BUNDLES
Lexical bundles are frequent word combinations that commonly appear in different registers. They have been the subject of much research in the area of corpus linguistics during the last decade. While most previous studies of bundles have mainly focused on variations in the use of these word combinations across different registers and a number of disciplines, not much research has been done to e...
Publication date: 2007